Detection of disfluencies in speech signal
نویسندگان
چکیده
During public presentations or interviews, speakers commonly and unconsciously abuse interjections or filled pauses that interfere with speech fluency and negatively affect listeners impression and speech perception. Types of disfluencies and methods of detection are reviewed. Authors carried out a survey which results indicated the most adverse elements for audience. The article presents an approach to automatic detection of the most common type of disfluencies filled pauses. A base of patterns of filled pauses (prolongated I, prolongated e, mm, Im, xmm, using SAMPA notation) was collected from 72 minutes of recordings of public presentations and interviews of six speakers (3 male, 3 female). Statistical analysis of length and frequency of occurrence of such interjections in recordings are presented. Then, each pattern from training set was described with mean values of first and second formants (F1 and F2). Detection was performed on test set of recordings by recognizing the phonemes using the two formants with efficiency of recognition about 68%. The results of research on disfluencies in speech detection may be applied in a system that analyzes speech and provides feedback of imperfections that occurred during speech in order to help in oratorical skills training. A conceptual prototype of such an application is proposed. Moreover, a base of patterns of most common disfluencies can be used in speech recognition systems to avoid interjections during speech-to-text transcription.
منابع مشابه
Automatic detection and annotation of disfluencies in spoken French corpora
In this paper we propose a multi-step system for the semiautomatic detection and annotation of disfluencies in spoken corpora. A set of rules, statistical models and machine learning techniques are applied to the input, which is a transcription aligned to the speech signal. The system uses the results of an automatic estimation of prosodic, part-of-speech and shallow syntactic features. We pres...
متن کاملAutomatic Detection and Removal of Disfluencies from Spontaneous Speech
Unlike rehearsed and prepared speech, spontaneous speech contains high occurrence of disfluencies, like repetitions, filled pauses, and hesitations. Disfluencies can seriously hamper the word recognition accuracy of an Automatic Speech Recogniser (ASR), by increasing word insertion and deletion and rejection rates. In this paper we introduce signal processing algorithms to automatically identif...
متن کاملA New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signals maybe classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
متن کاملAcoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition
Most automatic speech recognizers (ASRs) concentrate on read speech, which is different from spontaneous speech with disfluencies. ASRs cannot deal with speech with a high rate of disfluencies such as filled pauses, repetitions, lengthening, repairs, false starts and silence pauses. In this paper, we focus on the feature analysis and modeling of the filled pauses “ah,” “ung,” “um,” “em,” and “h...
متن کاملAutomatic Detection of Sentence Boundaries, Disfluencies, and Conversational Fillers in Spontaneous Speech
Automatic Detection of Sentence Boundaries, Disfluencies, and Conversational Fillers in Spontaneous Speech
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013